STAR: an algorithm to Search for Tandem Approximate Repeats

Authors

  • Olivier Delgrange
  • Eric Rivals
Abstract

MOTIVATION: Tandem repeats consist of approximate, adjacent repetitions of a DNA motif. Such repeats account for large portions of eukaryotic genomes and have also been found in other kingdoms of life. Owing to their polymorphism, tandem repeats have proven useful in genome cartography, forensics, population studies, and other applications. Nevertheless, they are neither systematically detected nor annotated in genome projects. Partly because of this lack of data, their evolution is still poorly understood. RESULTS: In this work, we design an exact algorithm to locate approximate tandem repeats (ATR) of a motif in a DNA sequence. Given a motif and a DNA sequence, our method, named STAR, identifies all segments of the sequence that correspond to significant approximate tandem repetitions of the motif. In our model, an Exact Tandem Repeat (ETR) comes from the tandem duplication of the motif, and an ATR derives from an ETR by a series of point mutations. An ATR can then be encoded as a number of duplications of the motif together with a list of mutations. Consequently, a sequence that is not an ATR cannot be encoded efficiently by this description, while a true ATR can. Our method uses the minimum description length criterion to identify which sequence segments are ATR. Our optimization procedure guarantees that STAR finds a combination of ATR that minimizes this criterion. AVAILABILITY: STAR is available for use at http://atgc.lirmm.fr/star
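
The encoding idea in the abstract can be illustrated with a small Python sketch. It is not the STAR coding scheme: it assumes substitutions only (no insertions or deletions), a fixed phase for the motif, and a made-up bit cost per mutation, and it scores a single segment rather than optimizing over combinations of segments. The function names `etr`, `atr_cost_bits`, `raw_cost_bits` and `compression_gain` are hypothetical.

```python
import math


def etr(motif: str, length: int) -> str:
    """Exact tandem repeat of `motif`, truncated to `length` characters."""
    copies = length // len(motif) + 1
    return (motif * copies)[:length]


def atr_cost_bits(segment: str, motif: str) -> float:
    """Bits needed to describe `segment` as an ETR plus substitutions.

    Assumed cost model (illustrative only): log2(n + 1) bits for the
    repeat length, then for each substitution log2(n) bits for its
    position and log2(3) bits for the replacement base.
    """
    n = len(segment)
    if n == 0:
        return 0.0
    reference = etr(motif, n)
    mismatches = sum(1 for a, b in zip(segment, reference) if a != b)
    length_bits = math.log2(n + 1)
    per_mutation_bits = math.log2(n) + math.log2(3)
    return length_bits + mismatches * per_mutation_bits


def raw_cost_bits(segment: str) -> float:
    """Baseline description: 2 bits per nucleotide (A, C, G, T)."""
    return 2.0 * len(segment)


def compression_gain(segment: str, motif: str) -> float:
    """Positive when the ATR description is shorter than the baseline."""
    return raw_cost_bits(segment) - atr_cost_bits(segment, motif)


if __name__ == "__main__":
    motif = "ACG"
    for seg in ("ACGACGACTACG", "TTGCAATCGGTA"):
        print(seg, round(compression_gain(seg, motif), 2))
```

Under these assumptions, a segment that is close to a tandem repetition of the motif gets a positive gain, while an unrelated segment costs more to describe as an ATR than to write out directly, which is the intuition behind the minimum description length criterion.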

Similar resources

A new memetic algorithm for mitigating tandem automated guided vehicle system partitioning problem

An Automated Guided Vehicle System (AGVS) provides the flexibility and automation demanded by a Flexible Manufacturing System (FMS). However, with growing concern about the responsible management of resource use, it is crucial to manage these vehicles efficiently in order to reduce travel time and control conflicts and congestion. This paper presents the development process of a new Memetic Alg...

Tandem repeats over the edit distance

MOTIVATION: A tandem repeat in DNA is a sequence of two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats occur in the genomes of both eukaryotic and prokaryotic organisms. They are important in numerous fields including disease diagnosis, mapping studies, human identity testing (DNA fingerprinting), sequence homology and population studies. Although tandem repea...

Solving the tandem AGV network design problem using tabu search: Cases of maximum workload and workload balance with fixed and non-fixed number of loops

A tandem AGV configuration connects all cells of a manufacturing area by means of non-overlapping, single-vehicle closed loops. Each loop has at least one additional P/D station, provided as an interface between adjacent loops. This study describes the development of three tabu search algorithms for the design of tandem AGV systems. The first algorithm was developed based on the basic definiti...

An Algorithm to Solve the Motif Alignment Problem for Approximate Nested Tandem Repeats

An approximate nested tandem repeat (NTR) in a string T is a complex repetitive structure consisting of many approximate copies of two substrings x and X ("motifs") interspersed with one another. NTRs fall into a class of repetitive structures broadly known as subrepeats. NTRs have been found in real DNA sequences and are expected to be important in evolutionary biology, both in understanding e...

Detection of Significant Patterns by Compression Algorithms: the Case of Approximate Tandem Repeats in DNA Sequences. Rivals

We use compression algorithms to analyse genetic sequences. The basic idea is that a compression algorithm is associated with a property. The more a sequence is compressed by the algorithm, the more significant is the property for that sequence. Here we present an algorithm to detect a particular type of dosDNA (Defined Ordered Sequence-DN...

Journal title:
  • Bioinformatics

Volume 20, Issue 16

Pages: -

Publication date: 2004